Abstract

We consider a controlled birth and death process that moves as
follows. Upon reaching a state $i$, a pair of birth-death parameters
$(\lambda, \mu)$ is selected from a prescribed set. Then the process remains
in the state $i$ until a birth or death occurs according to these
parameters, at which time a pair of birth-death parameters is again selected.
This is repeated indefinitely. A cost of $c(\lambda, \mu) + h(i)$ per unit time
is incurred for selecting $(\lambda, \mu)$ when the process is in state $i$, and a
reward is received for each birth. A policy is a rule for successively
selecting the birth-death parameters as a function of the state of the
process. We show, under some weak conditions, that there exist increasing
optimal policies for both the discounted and average reward criteria.
This means that it is optimal to increase the deaths and decrease the
births as the state of the process increases. We show how to compute
such an optimal policy for the case with two possible birth-death parameters.
We then apply our results to the optimal control of the arrival and service
rates in an M/M/1 queueing process.

Key Words: Markov decision processes, birth and death processes, queueing
processes, stochastic control, monotone optimal policies.


Optimal  Control  of 

Birth  and  Death  Processes  and  Queues 
by 


Richard  F.  Serfozo 
Syracuse  University 


1. Introduction

We shall study a controlled birth and death process that moves as
follows. When the process arrives at a state $i$ (a nonnegative integer)
the following events occur.

(1) A pair $(\lambda_a, \mu_a)$ of birth-death parameters is selected from the set
$\{(\lambda_1, \mu_1), \ldots, (\lambda_m, \mu_m)\}$. Think of $(\lambda_a, \mu_a)$ or
$a \in \{1, \ldots, m\}$ as the action taken. We assume that
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m > 0$ and $0 < \mu_1 \le \mu_2 \le \cdots \le \mu_m$.

(2) The process remains in state $i$ for a random time which has an
exponential distribution with parameter

$$\lambda(i,a) = \begin{cases} \lambda_a & \text{if } i = 0 \\ \lambda_a + \mu_a & \text{if } i \ge 1. \end{cases}$$

Then the process jumps to a neighboring state according to the transition
probabilities

$$q(i,a,i+1) = \lambda_a/(\lambda_a + \mu_a), \quad q(i,a,i-1) = \mu_a/(\lambda_a + \mu_a) \quad \text{when } i \ge 1,$$

and

$$q(0,a,1) = 1 \quad \text{when } i = 0.$$

(3) A cost is incurred at a rate $c(a) + h(i)$ per unit time, during the
sojourn in state $i$, for selecting the action $a$ and being in state $i$.
We assume that $c(a)$ is nondecreasing, and $h(i)$ is convex nondecreasing
with $h(0) = 0$. A reward $R$ is also received if the process jumps to
$i+1$: the $R$ is a nonnegative reward for a birth.

This series of events is repeated indefinitely. Note that the birth-death
parameters are selected at jump times of the process: they
cannot be changed between jumps.
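The dynamics in steps (1)-(3) can be illustrated with a short simulation. This is only a sketch under stated assumptions: actions are numbered from 0 rather than 1, and the rates and the threshold policy in the example are hypothetical values chosen for illustration, not taken from the paper.

```python
import random

def simulate(policy, lam, mu, start=0, n_jumps=1000, seed=0):
    """Simulate the controlled birth and death process for n_jumps jumps.

    policy: function mapping a state to an action index 0..m-1
            (0-based here; the text numbers actions 1..m).
    lam, mu: birth and death rates, one entry per action.
    Returns the visited states Y_n and the jump times T_n.
    """
    rng = random.Random(seed)
    state, t = start, 0.0
    states, times = [state], [t]
    for _ in range(n_jumps):
        a = policy(state)  # parameters are chosen only at jump times
        rate = lam[a] if state == 0 else lam[a] + mu[a]
        t += rng.expovariate(rate)  # exponential sojourn time
        # From state 0 the only possible jump is a birth.
        if state == 0 or rng.random() < lam[a] / (lam[a] + mu[a]):
            state += 1
        else:
            state -= 1
        states.append(state)
        times.append(t)
    return states, times

# Hypothetical example: two actions, switch to the second at state 3.
lam = [2.0, 1.0]
mu = [1.0, 3.0]
threshold_policy = lambda i: 0 if i < 3 else 1
states, times = simulate(threshold_policy, lam, mu, n_jumps=500)
```

The function returns the embedded sequence of states together with the jump times, which is the pair $(Y_n, T_n)$ used in the reward definitions.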

A policy $f$ for successively choosing the birth-death parameters is
defined to be a mapping from the state space $\{0, 1, \ldots\}$ to the action
space $\{1, \ldots, m\}$, with the interpretation that action $f(i)$ is taken when
the process is in state $i$. Each policy $f$, along with a rule for starting
the process, determines a continuous time birth and death process whose
birth-death parameters in state $i$ are $(\lambda_{f(i)}, \mu_{f(i)})$. We let $Y_n$ and $T_n$
denote the $n$-th state of the process, and the time at which the process
jumps to state $Y_n$, respectively. The action in state $Y_n$ is $a_n = f(Y_n)$.

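The discounted reward for the process is given by

$$W_f(i) = E_f\Big( \sum_{n=0}^{\infty} e^{-\beta T_n} g_\beta(Y_n, a_n) \,\Big|\, Y_0 = i \Big),$$

where $\beta > 0$ is a continuous time discount factor and $g_\beta(i,a)$, the
discounted gain in a sojourn, is

$$g_\beta(i,a) = E_f\Big( e^{-\beta T_1} R\,\delta(Y_1, i+1) - \int_0^{T_1} (c(a) + h(i)) e^{-\beta t}\,dt \,\Big|\, Y_0 = i,\ a_0 = a \Big)$$

$$= \begin{cases} (R - c(a) - h(i))/(\beta + \lambda(0,a)) & \text{if } i = 0 \\ (\lambda_a R - c(a) - h(i))/(\beta + \lambda(i,a)) & \text{if } i \ge 1. \end{cases}$$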

Here $T_1$ is the exponential sojourn time in state $i$, the $\delta(i,j) = 1$ or $0$
according as $i = j$ or $i \ne j$, and a reward $\rho$ received at time $t$ has a
value $\rho e^{-\beta t}$. Similarly, the average reward for the process is

$$\psi_f(i) = \lim_{t \to \infty} t^{-1} E_f\Big( \sum_{n=0}^{N_t} g_0(Y_n, a_n) \,\Big|\, Y_0 = i \Big),$$

where $N_t = \sup\{n : T_n \le t\}$ is the number of jumps in time $t$.
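The closed form of the discounted gain for $i \ge 1$ can be checked with a short calculation. The following sketch uses only the exponential sojourn time with rate $\lambda_a + \mu_a$ and the independence of the sojourn time and the jump direction; writing $\Theta$ for the sojourn rate is a shorthand introduced here for readability.

```latex
% Worked check of the discounted gain for i >= 1, with \Theta = \lambda_a + \mu_a.
\begin{align*}
E\big(e^{-\beta T_1}\big)
  &= \int_0^\infty e^{-\beta t}\,\Theta e^{-\Theta t}\,dt
   = \frac{\Theta}{\beta + \Theta}, \\
E\big(e^{-\beta T_1} R\,\delta(Y_1, i+1)\big)
  &= R \cdot \frac{\lambda_a}{\Theta} \cdot \frac{\Theta}{\beta + \Theta}
   = \frac{\lambda_a R}{\beta + \Theta}, \\
E\Big(\int_0^{T_1} (c(a) + h(i))\, e^{-\beta t}\,dt\Big)
  &= (c(a) + h(i))\,\frac{1 - E\big(e^{-\beta T_1}\big)}{\beta}
   = \frac{c(a) + h(i)}{\beta + \Theta}.
\end{align*}
% Subtracting the third line from the second gives
% (\lambda_a R - c(a) - h(i)) / (\beta + \lambda_a + \mu_a).
```

Setting $\beta = 0$ in the last expression gives the undiscounted gain per sojourn that enters the average reward.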

A policy $f^*$ is called $\beta$-discounted optimal if

$$W_{f^*}(i) = \sup_f W_f(i) \quad \text{for all } i,$$

and $f^*$ is called average optimal if

$$\psi_{f^*}(i) = \sup_f \psi_f(i) \quad \text{for all } i.$$

The aim is to find such optimal policies.

This controlled birth and death process is a continuous time Markov
decision process with bounded sojourn rates $\lambda_a + \mu_a$. Such continuous
time processes are equivalent to simpler discrete time Markov decision
processes. This equivalence, which was originally used by Howard and
Veinott (for processes with finite state spaces) and more recently by
Lippman [5], is discussed in [9]. We use this equivalence herein to
show that the controlled birth and death process is equivalent to the
controlled random walk that we studied in [10]. Then applying the results
in [10], we show that there exist increasing discounted and average
optimal policies for the birth and death process. (We use increasing
herein to mean nondecreasing.) This says that it is optimal to increase
the probability of deaths and to decrease the probability of births as
the state of the process increases. The results in [10] for computing
average optimal policies also apply to the birth and death process. We
illustrate this for a two action problem (when $m = 2$).

In the last section of this paper, we show how our results apply
to the optimal control of the arrival and service rates of M/M/1 queueing
processes. Some of the results herein have been derived by different
approaches in [1] - [8], [11] and [12]. Bibliographies on the optimal
control of queues are in [4], [11] and [12].

2. Main Results

We first show that the controlled birth and death process is equivalent
to a controlled random walk. Next we establish the existence of increasing
discounted and average optimal policies for the birth and death process.
Then we show how to compute one such policy.

We shall use the notation introduced above. In addition, we shall
consider a random walk on the nonnegative integers (as in [10]) which
moves as follows. Upon arriving at a location $i$ the following events
occur.

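(i) A pair of probabilities $(p_a, q_a)$ is selected from the set
$\{(p_1, q_1), \ldots, (p_m, q_m)\}$, where

$$p_k = \lambda_k/\Lambda, \quad q_k = \mu_k/\Lambda \quad \text{and} \quad \Lambda = \lambda_1 + \mu_m \quad \text{for } 1 \le k \le m.$$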

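(ii) A reward $r_\alpha(i,a)$ is received, where

$$r_\alpha(i,a) = \alpha\, g_\beta(i,a)(\beta + \lambda(i,a))\Lambda^{-1} = \begin{cases} (R - c(a) - h(i))/(\beta + \Lambda) & \text{for } i = 0 \\ (\lambda_a R - c(a) - h(i))/(\beta + \Lambda) & \text{for } i \ge 1, \end{cases} \tag{1}$$

and $\alpha = \Lambda/(\beta + \Lambda)$ is a discrete time discount factor.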

(iii) The next state of the walk is determined by the following transition
probabilities

$$p(i,a,i+1) = p_a, \quad p(i,a,i) = 1 - p_a - q_a, \quad p(i,a,i-1) = q_a \quad \text{for } i \ge 1,$$

and

$$p(0,a,1) = p_a \quad \text{and} \quad p(0,a,0) = 1 - p_a \quad \text{for } i = 0.$$

This  series  of  events  is  repeated  indefinitely. 
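The construction in (i)-(iii) can be sketched in a few lines. The choice $\Lambda = \lambda_1 + \mu_m$ (the largest possible sojourn rate under the ordering assumptions) is an assumption of this sketch, as are the 0-based action indices and the example rates.

```python
def uniformize(lam, mu, beta):
    """Build the random-walk data (p, q, Lam, alpha) from birth-death rates.

    Assumes lam is nonincreasing and mu is nondecreasing, so that
    Lam = lam[0] + mu[-1] bounds every sojourn rate lam[a] + mu[a].
    """
    Lam = lam[0] + mu[-1]
    p = [l / Lam for l in lam]   # probability of a step up under each action
    q = [m / Lam for m in mu]    # probability of a step down under each action
    alpha = Lam / (beta + Lam)   # discrete time discount factor
    return p, q, Lam, alpha

def step_probs(i, a, p, q):
    """Transition probabilities {next_state: prob} from state i under action a."""
    if i == 0:
        return {1: p[a], 0: 1.0 - p[a]}
    return {i + 1: p[a], i: 1.0 - p[a] - q[a], i - 1: q[a]}

# Hypothetical example rates.
p, q, Lam, alpha = uniformize(lam=[2.0, 1.0], mu=[1.0, 3.0], beta=0.5)
```

Because $\Lambda$ is at least as large as every $\lambda_a + \mu_a$, the holding probability $1 - p_a - q_a$ is never negative.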

A policy $f$ for this random walk is a function from the state space
$\{0, 1, \ldots\}$ to the action space $\{1, \ldots, m\}$. (These policies are the same as
those for the birth and death process.) A policy $f$, along with a rule
for starting the process, determines a random walk $\{X_n : n \ge 0\}$, where the
$n$-th action taken is $(p_a, q_a)$ when $f(X_n) = a$. The expected discounted
reward for this process is

$$V_f(i) = E_f\Big( \sum_{n=0}^{\infty} \alpha^n r_\alpha(X_n, f(X_n)) \,\Big|\, X_0 = i \Big),$$

where $\alpha = \Lambda/(\beta + \Lambda)$. The $V_f$ exists and $-\infty \le V_f(i) < \infty$ since the
$r_\alpha(i,a)$ is bounded from above. The expected average return for the
process is

$$\phi_f(i) = \lim_{n \to \infty} n^{-1} E_f\Big( \sum_{k=0}^{n} r_1(X_k, f(X_k)) \,\Big|\, X_0 = i \Big).$$

Note that when $\alpha = 1$ in (1) then $\beta = 0$, and so

$$r_1(i,a) = \begin{cases} (R - c(a) - h(i))\Lambda^{-1} & \text{for } i = 0 \\ (\lambda_a R - c(a) - h(i))\Lambda^{-1} & \text{for } i \ge 1. \end{cases} \tag{2}$$

Discounted and average optimal policies are defined as before.

The following result asserts that the birth and death process is
equivalent to this random walk in the sense that they have identical
optimal policies.

Theorem 2.1. A policy is $\beta$-discounted optimal for the birth and death
process if and only if it is $\alpha$-discounted optimal for the random walk.
A policy is average optimal for the birth and death process if and only
if it is average optimal for the random walk.

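Proof. Note that the transition probabilities and rewards of the two
processes are such that

$$p(i,a,j) = \begin{cases} \lambda(i,a)\, q(i,a,j)\, \Lambda^{-1} & \text{if } i \ne j \\ 1 - \lambda(i,a)\Lambda^{-1} & \text{if } i = j, \end{cases}$$

and

$$r_\alpha(i,a) = g_\beta(i,a)(\beta + \lambda(i,a))/(\beta + \Lambda).$$

Then from [9, Theorem 1.1] it follows that if $f$ is any policy then

$$W_f(i) = V_f(i) \quad \text{and} \quad \psi_f(i) = \Lambda \phi_f(i) \quad \text{for all } i.$$

Thus the assertions follow.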
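The first identity used in the proof can be checked numerically on a small example. This is a sanity check rather than a proof; the rates below and the choice of the uniformization constant as the first birth rate plus the last death rate are assumptions made for illustration, and actions are indexed from 0.

```python
def sojourn_rate(i, a, lam, mu):
    """Rate of the exponential sojourn in state i under action a."""
    return lam[a] if i == 0 else lam[a] + mu[a]

def jump_probs(i, a, lam, mu):
    """Jump probabilities q(i, a, j) of the birth and death process."""
    if i == 0:
        return {1: 1.0}
    total = lam[a] + mu[a]
    return {i + 1: lam[a] / total, i - 1: mu[a] / total}

def walk_probs(i, a, lam, mu, Lam):
    """Random-walk probabilities built from the identity in the proof:
    rate * q(i, a, j) / Lam for j != i, and 1 - rate / Lam for j == i."""
    rate = sojourn_rate(i, a, lam, mu)
    probs = {j: rate * pr / Lam for j, pr in jump_probs(i, a, lam, mu).items()}
    probs[i] = 1.0 - rate / Lam
    return probs

# Hypothetical example rates.
lam, mu = [2.0, 1.0], [1.0, 3.0]
Lam = lam[0] + mu[-1]
rows = {(i, a): walk_probs(i, a, lam, mu, Lam)
        for i in range(4) for a in range(2)}
```

Each row reproduces the step-up probability `lam[a] / Lam` and, for positive states, the step-down probability `mu[a] / Lam`, which is how the random walk was defined directly.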

Our next result establishes the existence of increasing optimal
policies for our birth and death process. An increasing policy is of
the form

$$f(i) = a \quad \text{if } i_a \le i < i_{a+1},$$

where $0 = i_1 \le i_2 \le \cdots \le i_m \le i_{m+1} = \infty$. Under this policy, if the process
is in state $i$ and $i_a \le i < i_{a+1}$, the birth-death parameters $(\lambda_a, \mu_a)$
are selected. Since $\lambda_1 \ge \cdots \ge \lambda_m$ and $\mu_1 \le \cdots \le \mu_m$, the selected
death rate increases as the state $i$ increases, and the selected birth
rate decreases as $i$ increases. In other words, the probability of
backward movement increases as the state $i$ increases.
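An increasing policy is fully described by its switching points. The sketch below represents such a policy by the list of thresholds and looks up the action with a binary search; the function name and the example thresholds are hypothetical, and the last switching point is taken to be infinite.

```python
from bisect import bisect_right

def increasing_policy(thresholds):
    """Return f with f(i) = a whenever i_a <= i < i_{a+1}.

    thresholds: [i_1, ..., i_m] with 0 = i_1 <= i_2 <= ... <= i_m;
    i_{m+1} is treated as infinity. Actions are numbered 1..m as in the text.
    """
    ordered = all(x <= y for x, y in zip(thresholds, thresholds[1:]))
    if thresholds[0] != 0 or not ordered:
        raise ValueError("need 0 = i_1 <= i_2 <= ... <= i_m")
    # bisect_right counts the thresholds that are <= i, which is the
    # largest a with i_a <= i.
    return lambda i: bisect_right(thresholds, i)

f = increasing_policy([0, 2, 2, 5])
actions = [f(i) for i in range(8)]  # [1, 1, 3, 3, 3, 4, 4, 4]
```

Equal consecutive thresholds mean that an action is never used; in the example, action 2 is skipped because its interval is empty.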

Theorem  2.2.  There  exist  increasing  discounted  and  average  optimal 
policies  for  controlling  the  birth  and  death  process. 

Proof. Consider the controlled random walk we defined above for discounted
rewards. It satisfies the following conditions:

$$p_1 \ge \cdots \ge p_m, \quad q_1 \le \cdots \le q_m \quad \text{and} \quad p_1 + q_m \le 1. \tag{3}$$

$$r_\alpha'(i,1) \le \cdots \le r_\alpha'(i,m) \quad \text{and} \quad r_\alpha'(i,1) \ge r_\alpha'(i+1,m) \quad \text{for all } i, \tag{4}$$

where

$$r_\alpha'(i,a) = r_\alpha(i+1,a) - r_\alpha(i,a) = \begin{cases} -[(1-\lambda_a)R + h(1)]/(\beta + \Lambda) & \text{if } i = 0 \\ -(h(i+1) - h(i))/(\beta + \Lambda) & \text{if } i \ge 1. \end{cases}$$

Then by [10, Theorem 2.1] it follows that there exists an increasing
$\alpha$-discounted optimal policy for controlling the random walk. This policy
is also $\beta$-optimal, according to Theorem 2.1, for controlling the birth and
death process.

Now consider the controlled random walk for average rewards
(recall (2)). It also satisfies (3) - (4) and

$$r_1(i,1) \ge \cdots \ge r_1(i,m) \quad \text{for all } i. \tag{5}$$

Then by [10, Theorem 6.1], there exists an increasing average optimal
policy for the random walk. By Theorem 2.1, this policy is also average
optimal for the birth and death process.
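The existence result can be explored numerically by running value iteration on the discounted random walk. The sketch below is not the method of [10]; it assumes a uniformization constant equal to the first birth rate plus the last death rate, uses the one-step reward as written in equation (1), indexes actions from 0, and truncates the state space so that an upward move from the top state is replaced by staying put. All numerical values are hypothetical.

```python
def value_iteration(lam, mu, c, h, R, beta, n_states=60, tol=1e-10,
                    max_iter=100000):
    """Value iteration for the discounted random walk on 0..n_states-1.

    lam, mu, c: per-action birth rates, death rates and cost rates.
    h: holding cost function of the state.
    Returns the value function V and a greedy policy (0-based actions).
    """
    Lam = lam[0] + mu[-1]
    alpha = Lam / (beta + Lam)
    m = len(lam)
    top = n_states - 1

    def reward(i, a):
        # One-step reward as stated in equation (1) of the text.
        birth = R if i == 0 else lam[a] * R
        return (birth - c[a] - h(i)) / (beta + Lam)

    def q_value(i, a, V):
        p, q = lam[a] / Lam, mu[a] / Lam
        up = V[min(i + 1, top)]  # truncation: no move above the top state
        if i == 0:
            nxt = p * up + (1 - p) * V[0]
        else:
            nxt = p * up + q * V[i - 1] + (1 - p - q) * V[i]
        return reward(i, a) + alpha * nxt

    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [max(q_value(i, a, V) for a in range(m))
                 for i in range(n_states)]
        diff = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    policy = [max(range(m), key=lambda a: q_value(i, a, V))
              for i in range(n_states)]
    return V, policy

V, policy = value_iteration(lam=[2.0, 1.0], mu=[1.0, 3.0], c=[0.0, 1.0],
                            h=lambda i: 0.5 * i, R=1.0, beta=0.5)
is_increasing = all(x <= y for x, y in zip(policy, policy[1:]))
```

The flag `is_increasing` reports whether the computed greedy policy is nondecreasing in the state. Because of the truncation, behavior near the top state may differ from the infinite-state problem, so the boundary should be placed well above the states of interest.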

In Theorem 2.2, we assumed that the birth-death parameters are
selected at each jump from a set $\{(\lambda_1, \mu_1), \ldots, (\lambda_m, \mu_m)\}$ which is
independent of the state of the process. Suppose instead that when the
process is in state $i$, a pair of birth-death parameters $(\lambda(i,a), \mu(i,a))$
is selected from a set $\{(\lambda(i,1), \mu(i,1)), \ldots, (\lambda(i,m), \mu(i,m))\}$. Assume that

$$\lambda(i,1) \ge \cdots \ge \lambda(i,m), \quad \mu(i,1) \le \cdots \le \mu(i,m), \tag{6}$$

$$d'(i,1) \le \cdots \le d'(i,m) \le 0 \quad \text{and} \quad d'(i,1) \ge d'(i+1,m) \quad \text{for all } i, \tag{7}$$

where

$$d(i,a) = \mu(i,a) - \lambda(i,a) \quad \text{and} \quad d'(i,a) = d(i+1,a) - d(i,a).$$

This controlled birth and death process is equivalent to the random walk
in Section 3 of [10]. The proof of this is the same as that for Theorem
2.1. From this equivalence, along with [10, Theorem 3.1] and an average
reward analog of it, it follows that Theorem 2.2 holds. This is also
discussed in [1] just for discounted rewards. Other analogs of Theorem 2.2
for decreasing policies, or for finite time horizons, can be obtained in
the same way from Theorem 2.3 or Theorem 9.1 in [10].

The  results  in  [10]  on  the  computation  of  average  optimal  policies 
also  apply  to  birth  and  death  processes.  We  illustrate  this  for  a 
special  case. 

Theorem 2.4. Suppose the birth and death process has two possible
parameter pairs $(\lambda_1, \mu_1)$ and $(\lambda_2, \mu_2)$, and that the reward for a sojourn in
state $i$ is

$$g_0(i,a) = (-c(a) - ih)/\lambda(i,a) \quad \text{for all } a,$$

where $h > 0$. Then it is average optimal to select $(\lambda_1, \mu_1)$ when the state
of the process is below $n^*$ and to use $(\lambda_2, \mu_2)$ otherwise. Here $n^*$ is the
smallest integer for which $D_n > 0$, where

I)  =i  n + In,"  + b - o.,(c(2)  - c ( I ) ) / (li(i- , -p,J  ( I-l'n)  ) if  P,  J*  1 
nil  2 122  l 

1 n*^  4 n(l+i  - 2.  ^(c(2)  - c ( 1 ) ) / (h  (i  ^-P2>  ) if  Pj^  = 1 

and  , = A /p  and  b = (i  , ( 1-p  , ) ( l-(  ,J  ) . 

Furthermore 

n*  < ( u„(c(2)  - c(l))/(h(p  -p.J(L-o,.))  if  p,  ¥ 1 
j 2 J Z 2 1 

) I2p^(c(2l  - c(l))/(h(Pj-P2))  if  = 1- 

Proof. The policy described in this theorem is average optimal, by
[10, Corollary 7.2], for the random walk. Thus, by Theorem 2.1, it is
average optimal for the birth and death process.
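For a two action problem with linear holding cost and no birth reward, the average cost of any threshold policy can also be computed directly from the stationary distribution of the resulting birth and death process. The sketch below is a brute-force numerical cross-check and not the closed-form rule of the theorem; it assumes the second action gives a stable process (second birth rate below second death rate), truncates the state space, indexes actions from 0, and uses hypothetical rates and costs.

```python
def average_cost(n, lam, mu, c, h, n_states=400):
    """Long-run cost per unit time when (lam[0], mu[0]) is used in states
    below n and (lam[1], mu[1]) is used in states n, n+1, ...

    Uses the stationary distribution on the truncated space 0..n_states-1.
    """
    act = lambda i: 0 if i < n else 1
    weights = [1.0]
    for i in range(1, n_states):
        # Detailed balance: pi_i * death(i) = pi_{i-1} * birth(i-1).
        weights.append(weights[-1] * lam[act(i - 1)] / mu[act(i)])
    total = sum(weights)
    cost = sum(w * (c[act(i)] + h * i) for i, w in enumerate(weights))
    return cost / total

def best_threshold(lam, mu, c, h, max_n=50):
    """Threshold in 0..max_n with the smallest average cost (first on ties)."""
    costs = [average_cost(n, lam, mu, c, h) for n in range(max_n + 1)]
    return min(range(max_n + 1), key=costs.__getitem__)

# Hypothetical example: the second action is slower to arrive, faster to serve.
lam, mu = [1.0, 0.8], [1.2, 2.0]
n_cheap = best_threshold(lam, mu, c=[0.0, 0.0], h=1.0)
n_costly = best_threshold(lam, mu, c=[0.0, 100.0], h=1.0)
```

When the two actions cost the same, the search returns threshold 0, so the second pair is used everywhere; a large extra cost for the second pair pushes the threshold up. The search is limited to thresholds up to `max_n`, so a result equal to `max_n` suggests widening the range.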

3. Optimal Control of Arrival and Service Rates in an M/M/1 Queue

The following are examples of controlled birth and death processes
which have monotone optimal policies as we discussed above.

M/M/1 Queue with a Controlled Service Rate. Suppose an M/M/1 queue has
a fixed arrival rate $\lambda$ and its service rate is controlled as follows.
At each service completion or customer arrival, the number of customers
in the system is observed. Based on this number a service rate $\mu_a$ is
selected from the set $\{\mu_1, \ldots, \mu_m\}$, where the $\mu$'s are subscripted so that
$0 < \mu_1 \le \cdots \le \mu_m$. A cost $c(a)$ per unit time is charged for using $\mu_a$,
and a cost $h(i)$ per unit time is charged for holding $i$ customers in the
system. A reward $R$ is also received from each customer. We assume that
$c(a)$ is increasing, and $h(i)$ is convex increasing with $h(0) = 0$. This is
a controlled birth and death process as in Theorem 2.2 with birth-death
parameters $\{(\lambda, \mu_1), \ldots, (\lambda, \mu_m)\}$. Thus it is optimal (for both discounted
and average rewards) to increase the service rate as the number of
customers increases. This was proved in [3], and similar results for
finite length queues are discussed in [2], [8] and some of the references
in [4].

M/M/1 Queue with a Controlled Arrival Rate. Suppose an M/M/1 queue has a
fixed service rate $\mu$, and the arrival rate $\lambda_a$ is selected from a set
$\{\lambda_1, \ldots, \lambda_m\}$, where $\lambda_1 \ge \cdots \ge \lambda_m > 0$, at each service completion or
customer arrival. Costs $c(a)$ and $h(i)$ are incurred as above, and a
reward $R$ is received from each customer. Then by Theorem 2.2 it is
optimal to decrease the arrival rate as the number of customers increases.
This was proved in [7]; see [6] for finite queue lengths.

M/M/1 Queue with Controlled Arrival and Service Rates. Suppose in an M/M/1
queue that the arrival and service rate pair $(\lambda_a, \mu_a)$ is selected, at each
service completion and customer arrival, from a set $\{(\lambda_1, \mu_1), \ldots, (\lambda_m, \mu_m)\}$
where $\lambda_1 \ge \cdots \ge \lambda_m > 0$ and $0 < \mu_1 \le \cdots \le \mu_m$. With the costs $c(a)$ and
$h(i)$, and reward $R$, as above, it is optimal to increase the service rate
and decrease the arrival rate as the number of customers increases.
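The three queueing models differ only in how the list of birth-death parameter pairs is built. The helper names below are hypothetical, and the sketch merely arranges the rates in the order that the ordering assumptions require; note that sorting renumbers the actions, so any cost rates $c(a)$ would have to be reordered to match.

```python
def check_monotone_parameters(lam, mu):
    """True if lam is positive and nonincreasing and mu is positive and
    nondecreasing, as required of the birth-death parameter pairs."""
    lam_ok = all(x >= y for x, y in zip(lam, lam[1:])) and lam[-1] > 0
    mu_ok = all(x <= y for x, y in zip(mu, mu[1:])) and mu[0] > 0
    return lam_ok and mu_ok

def controlled_service(arrival_rate, service_rates):
    """Fixed arrival rate, selectable service rates (sorted ascending)."""
    rates = sorted(service_rates)
    return [arrival_rate] * len(rates), rates

def controlled_arrival(arrival_rates, service_rate):
    """Selectable arrival rates (sorted descending), fixed service rate."""
    rates = sorted(arrival_rates, reverse=True)
    return rates, [service_rate] * len(rates)

# Hypothetical example rates.
lam_s, mu_s = controlled_service(1.0, [2.0, 1.5, 3.0])
lam_a, mu_a = controlled_arrival([0.5, 2.0, 1.0], 1.5)
```

A fixed rate appears as a constant list, which satisfies the ordering conditions with equality; this is why the assumptions are stated with non-strict inequalities.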